Learning Electron Densities in the Condensed Phase

We introduce a local machine-learning method for predicting the electron densities of periodic systems. The framework is based on a numerical, atom-centered auxiliary basis, which enables an accurate expansion of the all-electron density in a form suitable for learning isolated and periodic systems alike. We show that, using this formulation, the electron densities of metals, semiconductors, and molecular crystals can all be accurately predicted using symmetry-adapted Gaussian process regression models, properly adjusted for the nonorthogonal nature of the basis. These predicted densities enable the efficient calculation of electronic properties, which present errors on the order of tens of meV/atom when compared to ab initio density-functional calculations. We demonstrate the key power of this approach by using a model trained on ice unit cells containing only 4 water molecules to predict the electron densities of cells containing up to 512 molecules and see no increase in the magnitude of the errors of derived electronic properties when increasing the system size. Indeed, we find that these extrapolated derived energies are more accurate than those predicted using a direct machine-learning model. Finally, on heterogeneous data sets SALTED can predict electron densities with errors below 4%.


Calculation of the overlap matrix and vector of projections in periodic systems
In Eqs. 4 and 5 of the main text we define the overlap matrix of the basis functions, S, and the vector containing the projection of the density onto each basis function, w, in compact notation. We gave two equivalent forms of these objects -the "folded" representation, in which the domain of integration is restricted to a single unit cell onto which contributions from neighbouring unit cells are projected, and the "unfolded" representation in which the integral is performed over all space. Here we present those same expressions with all arguments made explicit: The equivalence of the two expressions for w i,σ can be concisely proven, beginning with the unfolded representation and dividing the integral over all space into an sum of integrals over every unit cell: dr φ i,σ (r − R i + T(0))ρ QM (r) = U=(0,0,0) drφ i,σ (r − R i + T(0))ρ QM (r) + U=(1,0,0) drφ i,σ (r − R i + T(0))ρ QM (r) + . . .
Here the subscript to the integral U indicates that the domain of integration is the unit cell translated by an integer multiple U = (U x , U y , U z ) of the lattice vectors from the central reference unit cell. The equivalence of the two representations of the overlap matrix can be proven in an analogous manner.
2 Error analysis of the electrostatic and Hartree energies The electrostatic energy is defined as where the sum over i runs over all atoms in the system with R i the position of atom i. The Hartree energy E H is defined as the first term of Eq. (S4), the electron-electron contribution to the electrostatic energy. To first order, errors in the electron density δρ(r) introduce the following errors to the electrostatic and Hartee energies: The error in the electrostatic energy can be re-written to simplify a direct comparison between the two errors: where V (r) is the electrostatic potential. This demonstrates that provided the contributions to the error from the electron-electron interactions and electron-nuclear attractions are of the same order, these two terms will screen one another, reducing the error in the electrostatic energy relative to the error in the Hartree error. This is in fact what we observe in the errors arising from the predicted densities described in Section IIIB of the main text. These are summarised in the last two columns Table. S1.
However, we observe the opposite trend in the errors arising from the RI densities described in Section IIIA -the errors in the electrostatic energy are significantly larger than those in the Hartree energy, as shown in the first two columns of Table. S1. This requires that: We can show that this condition is satisfied when the error in the density is dominated by Table S2: For the auxiliary basis constructed the using the overlap metric for orthogonalisation, the average error in the approximate RI density (¯ RI ρ ), along with the average error in exchange-correlation and electrostatic energies derived from it (¯ RI xc and¯ RI el ). These errors are relative to the QM reference values.¯ RI xc and¯ RI el are the "baselined" average errors, which remain after the mean error has been subtracted from each energy; this indicates the remaining error after the systematic error has been removed. All energies are reported in meV per atom, and are presented in comparison to those in Table I The left hand side will be dominated by terms where i = j, while the contributions to the integral on the right hand side will be largest when r = R j . For these contributions, the only difference between the two terms are the numerators: the nuclear charge of nucleus j Z j on the left compared with the electron density at nucleus j on the right. The former must be significantly larger than the latter, satisfying the condition in Eq. (S8). This justifies our assertion in Section IIIA of the main text that the large errors in the electrostatic energy derived from the RI densities of silicon and aluminium arise from an inaccurate description of the electron density near the nucleus.
We found that this systematic error could be significantly reduced by using an "alternative auxiliary basis", which is obtained by using an overlap metric when performing the Gram-Schmidt orthogonalisation required to eliminate linear dependencies in the auxiliary basis, rather than the Coulomb metric used to construct the "standard auxiliary basis" in FHI-aims, which has been used until this point. The average errors in the resulting RI densities are shown in Table S2, along with the average errors in the exchange-correlation and electrostatic Figure S1: For the auxiliary basis constructed the using the overlap metric for orthogonalisation, the distribution of the absolute errors in the exchange-correlation energy,¯ ML xc , arising from the predicted density of 20 randomly selected structures from the three datasets. energies derived from the RI density. By comparison to Table I in the main text, it is clear that this "alternative auxiliary basis" provides a superior RI density for aluminium and silicon. The RI density of the ice structures is largely unchanged, since changing the metric used in the orthogonalisation made very little difference to the resulting auxiliary basis.
However, we found that this using this alternative basis led to significantly less stable results when using the SALTED method. As an example of this, Figure S1 shows the distribution of the absolute errors in the exchange-correlation energy derived from the density predicted by the SALTED method for 20 structures from the three datasets using this alternative basis. There are at least two significant outliers in both the aluminium and silicon datasets; this stands in contrast to the equivalent results obtained using the standard auxiliary basis in FHI-aims, shown in the middle panel of Figure 3 in the main text, which contain no outliers at all.
When using the standard auxiliary basis, the errors introduced to the electrostatic energies of aluminium and silicon by the RI approximation are smaller than those introduced by the SALTED method, and so we find that this is still an acceptable basis. Nonetheless, it is important to note that these systematic errors can be reduced by a suitable change to the auxiliary basis functions, but that functions which produced an better RI density are not necessarily better suited to the SALTED machine learning algorithm.

SALTED hyper-paramaters for homogeneous datasets
When training a SALTED model, there are three hyper-parameters which must be optimised. Two of them are associated with the spatial extent and resolution of the smooth Gaussian density that enters the λ-SOAP representation of the atomic structure, namely the radial cutoff r c of the atomic environment and the broadening σ of the Gaussian functions.
The third parameter is the regularization η, defined following Eq. 8 in the main text, which modulates the smoothness of the model and therefore its accuracy in an out-of-sample prediction. In addition, the model must be converged with respect to the number of atomic environments M used in the sparse approximation of the density-coefficients in Eq. 7.
For each dataset, we first optimised the two SOAP parameters r c and σ simultaneously,  Table S3.
The reference densities used in the main text are calculated with standard "tight" settings within AIMS, using converged k-point grids. For all aluminium structures, as well as the smaller silicon structures, a (16 × 16 × 16) grid is used; for the larger silicon structures an (8 × 8 × 8) grid is used. For the 4-molecule ice systems a (4 × 4 × 4) grid is used, while the densities of the ice supercells are calculated at the Gamma point.

SALTED hyper-paramaters for heterogeneous datasets
To demonstrate the accuracy of SALTED when applied to heterogeneous datasets, first

Direct GPR hyper-parameters and learning curves
The optimal values of the hyperparameters used in the direct GPR predictions of the electrostatic and exchange-correlation energies are provided in Table S4.     Figure S9: The % RMSE in the density of a test set of 200 water molecules, as a function of the number of structures used in the training set N , for which the density has been expanded using the numerical atom-centred orbitals as described in the main text.
As is mentioned in the main text, the formalism presented here may be applied equally to periodic systems and isolated molecules. As an illustration of this, we show in Figure S9 the learning curve for a set of isolated water molecules, obtained using r c = 4Å, σ = 0.3Å, molecules. Figure S9 also shows the learning curve obtained for the same structures, but using a Gaussian basis set to both calculate the QM reference density (specifically the cc-pvdz basis) and to represent the RI and ML densities (using the RI-cc-pvdz basis). The optimal hyperparameters are found to be the same in both cases. The learning curve is broadly similar to that obtained using the numerical atom-centred orbitals of FHI-aims, although slightly less accurate.